Entry Name: SMU-KAM-MC1

VAST Challenge 2015
Mini-Challenge 1

 

 

Team Members:

Kam Tin Seong, Singapore Management University, tskam@smu.edu.sg (Professor)

Cai Yuanchao, Singapore Management University, yuanchaocai.2014@mitb.smu.edu.sg

Tay Hui Leng Karen, Singapore Management University, karen.tay.2014@mitb.smu.edu.sg

Budi Winarto, Singapore Management University, budiwinarto.2014@mitb.smu.edu.sg

Student Team:  Yes

 

Did you use data from both mini-challenges? No

 

Analytic Tools Used:

Tableau

JMP Pro

 

Approximately how many hours were spent working on this submission in total? ±100 hours

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2015 is complete? Yes

 


 

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


 

Questions

MC1.1 Characterize the attendance at DinoFun World on this weekend. Describe up to twelve different types of groups at the park on this weekend.

a.       How big is this type of group?

b.      Where does this type of group like to go in the park?

c.       How common is this type of group?

d.      What are your other observations about this type of group?

e.      What can you infer about this type of group?

f.        If you were to make one improvement to the park to better meet this group’s needs, what would it be?

Limit your response to no more than 12 images and 1000 words.

First, we aggregated each day’s data at the visitor level, summing the number of check-ins by Ride Type and Location (how Ride Type and Location are derived is explained in the answer to Question 2). We also captured the times of arrival and departure to derive the time spent per day at the theme park. We then merged the 3 days of data into 1 dataset and introduced a new column, “Repeat Visitors”, to indicate whether the visitor came only on Fri, Sat or Sun, or on “Fri and Sat”, “Sat and Sun”, or “Fri, Sat and Sun”. The total number of check-ins by Ride Type was calculated by summing the corresponding figures over the 3 days. Similarly, the ratio of check-ins by Ride Type was derived by dividing each Ride Type total by the overall total number of check-ins.
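As a rough illustration only, the aggregation described above could be sketched in Python as follows. The record layout `(visitor_id, day, ride_type)`, the sample values and the function names `repeat_label` and `summarize_visitors` are assumptions made for this sketch; the actual preparation was done with the tools listed at the top of this entry.

```python
from collections import defaultdict

# Hypothetical check-in records: (visitor_id, day, ride_type).
# The layout and the values are illustrative, not taken from the real data.
checkins = [
    (101, "Fri", "Thrill Rides"),
    (101, "Fri", "Thrill Rides"),
    (101, "Sat", "Rides for Everyone"),
    (202, "Sun", "Kiddie Rides"),
]

DAY_ORDER = ["Fri", "Sat", "Sun"]


def repeat_label(days_seen):
    """Build labels such as "Sun", "Fri and Sat" or "Fri, Sat and Sun"."""
    ordered = [d for d in DAY_ORDER if d in days_seen]
    if not ordered:
        return ""
    if len(ordered) == 1:
        return ordered[0]
    return ", ".join(ordered[:-1]) + " and " + ordered[-1]


def summarize_visitors(records):
    """Sum check-ins per visitor over all days and derive per-type ratios."""
    counts = defaultdict(lambda: defaultdict(int))  # visitor -> ride type -> count
    days = defaultdict(set)                         # visitor -> days visited
    for visitor, day, ride_type in records:
        counts[visitor][ride_type] += 1
        days[visitor].add(day)

    summary = {}
    for visitor, by_type in counts.items():
        total = sum(by_type.values())
        summary[visitor] = {
            "repeat_visitors": repeat_label(days[visitor]),
            "total": total,
            "by_type": dict(by_type),
            # Ratio = check-ins of one ride type / all check-ins of the visitor.
            "ratio": {t: n / total for t, n in by_type.items()},
        }
    return summary


summary = summarize_visitors(checkins)
print(summary[101]["repeat_visitors"], summary[101]["ratio"])
```

Each entry in `summary` corresponds to one row of the merged visitor-level dataset, with the absolute totals and the ratios kept side by side so that either can be used as a clustering variable.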

We used hierarchical clustering in JMP and used the dendrogram and heatmap generated from the clustering results to identify the distinct features of each cluster. We also used the slider in JMP to vary the number of clusters generated. We then used the coordinated linked view to select a specific cluster and examine the histograms of the various variables for that cluster.
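For readers without JMP, the idea of agglomerative (bottom-up) hierarchical clustering can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: it uses Euclidean distance with average linkage, which may differ from the settings used in JMP, and the ratio vectors in `profiles` are invented for illustration. Changing `n_clusters` plays the role of the slider.

```python
import math


def average_linkage(a, b, points):
    """Mean Euclidean distance over all pairs drawn from clusters a and b."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # start with singletons
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(clusters[x], clusters[y], points)
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge y into x
        del clusters[y]
    return clusters


# Hypothetical per-visitor ratios: (Thrill, Rides for Everyone, Kiddie).
profiles = [
    (0.80, 0.20, 0.00),
    (0.75, 0.25, 0.00),
    (0.30, 0.30, 0.40),
    (0.35, 0.30, 0.35),
]

for k in (3, 2):
    print(k, agglomerate(profiles, k))
```

The naive pairwise search is only practical for small inputs; it is meant to show how cutting the hierarchy at different cluster counts yields coarser or finer groups, not to reproduce the JMP results.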

We approached the write-up as follows: we first identified general patterns in a big cluster, then zoomed into a subset of that cluster to identify more distinctive features.

Results of Hierarchical Clustering  

Big Group 1

 

Fig 1-1. Hierarchical Cluster of Big Group 1

a. The group size ranges from 1 to 10.

Fig 1-2. Group sizes found in Big Group 1

b. They have a preference for Thrill Rides. They also take Rides for Everyone, but not as often as Thrill Rides. They don’t take Kiddie Rides.

Fig 1-3. Distribution of Total check-in over 3 days (breakdown by Ride Type)

c. There are 793 such groups.

d. Most of them check in in the morning.

e. They are Adrenaline Seekers.

f. The theme park could offer the adrenaline seekers priority access to Rides for Everyone or Kiddie Rides once they have taken more than 5 Thrill Rides. This would help spread out the traffic.

 

Group 1a

Fig 1a-1. Hierarchical Cluster of Group 1a

a. Over half the groups are loners, i.e. their group size is 1. From the 75th to the 97.5th percentile, the group size ranges from 3 to 7. The maximum group size is 8.

Fig 1a-2. Group sizes found in Group 1a

b. Group 1a likes to go for Thrill Rides and hardly takes Kiddie Rides. They do take Rides for Everyone, but 2 to 3 times less often than Thrill Rides.

Fig 1a-3. Distribution of Total check-in over 3 days (breakdown by Ride Type)

c. There are 106 such groups.

d. They entered the theme park in the morning and took more than 40 rides per visitor. Most of them are repeat visitors, and about 75% of them spent 3 days in the theme park.

Fig 1a-4. Other observations about Group 1a

e. Group 1a are the “Loner Adrenalin Seekers”, and there are no young children in the group.

f. Although the app can currently send “friend” invitations, the theme park could enhance it so that such loners can identify themselves as “Loner Adrenalin Seekers” and easily send “friend” invitations to other group members with common interests.

 

 


 

Group 1b

Fig 1b-1. Hierarchical Cluster of Group 1b

a. Group sizes up to the 90th percentile range from 1 to 4, and the maximum group size is 8.

Fig 1b-2. Group sizes found in Group 1b

b. They take mostly Thrill Rides. They seldom go for Rides for Everyone and hardly take Kiddie Rides. They don’t go for Shows & Entertainment at all.

Fig 1b-3. Distribution of Total check-in over 3 days (breakdown by Ride Type)

c. There are 57 such groups.

d. Most of them spend one day in the theme park, visiting on Sunday. The total number of rides taken ranges from 3 to 33. Most of the Sunday crowd arrives in the morning, though some visitors arrive in the late afternoon or evening.

Fig 1b-4. Other observations about Group 1b

e. These are the “Extremists”, whose sole focus is on Thrill Rides.

f. The park operator could encourage such “Extremists” to take more Rides for Everyone by offering them a FastPass if they take more than 3 consecutive Thrill Rides.

 

Group 1c

 

Fig 1c-1. Hierarchical Cluster of Group 1c

a. The group size ranges from 1 to 8, with slightly more than half of the groups having a size of 1.

Fig 1c-2. Group sizes found in Group 1c

b. Though Group 1c shows a high preference for Thrill Rides, more than half took between 35 and 40 Thrill Rides in total. Their next preference is Rides for Everyone, of which more than half took between 16 and 18 in total. This group does not take Kiddie Rides.

Fig 1c-3. Distribution of Total check-in over 3 days (breakdown by Ride Type)

c. There are 44 such groups.

d. They spend more than 1 day in the theme park. The total number of check-ins ranges from 56 to 99. They are the early birds, arriving at the theme park before 9am.

Fig 1c-4. Other observations about Group 1c

e. Though this group shows a high preference for exciting rides, their absolute number of check-ins is on the lower end compared to Group 1a.

f. Since this group likes to take a more leisurely pace, shopping or dining vouchers could be provided to encourage them to spend more at the retail shops or dining places.

 

Big Group 2

 

a. The group size ranges from 1 to 42.

b. They prefer Thrill Rides and also take Rides for Everyone and Kiddie Rides. They go to shows more than Big Group 1 does.

c. There are 513 such groups.

d. Most of them check in in the morning and spend one day in the park; the total number of check-ins is around 18.

e. They are probably family groups.

f. If you were to make one improvement to the park to better meet this group’s needs, what would it be?

 

Group 2a

 

a. The group size ranges from 1 to 11, with most of them having a group size less than 5.

b. Group 2a has an equal preference for Thrill Rides and Rides for Everyone, and a slightly lower preference for Kiddie Rides. However, the absolute number of rides by ride type is on the lower end.

c. There are 35 such groups.

d. Most of them spend 1 day in the theme park, and arrived in the morning.

 

 

e. These could be the “Nuclear Family” groups of 2 adults with young children. They are not very active, as their total number of check-ins is on the lower end compared to other groups.

f. The park could offer bundled tickets with 2 adult tickets and 2 child tickets, priced lower than buying them individually.

 

Big Group 3

a. The group size ranges from 1 to 44.

b. They prefer Thrill Rides and have an equal preference for Rides for Everyone and Kiddie Rides.


c. There are 510 such groups.

d. Most of them check in in the morning and spend one day in the theme park.

e. What can you infer about the group?

f. If you were to make one improvement to the park to better meet this group’s needs, what would it be?

Group 3a

Fig 3a-1. Hierarchical Cluster of Group 3a

a. The group size ranges from 1 to 8.

Fig 3a-2. Group sizes found in Group 3a

b. This group does not go for any rides or shows.

Fig 3a-3. Distribution of Total check-in over 3 days (breakdown by Ride Type)

c. There are 34 such groups.

d. One group of 8 are repeat visitors; the rest visited for only 1 day. Most of them arrived at the theme park in the morning even though they did not take any rides.

Fig 3a-4. Other observations about Group 3a

e. These are probably the “Senior Citizens” group, who generally take a walk around the theme park. They may also have come with younger family members who were taking the rides and who were therefore not identified as part of the same group.

f. The park operator could provide benches around the theme park, so that the senior citizens can rest whenever they are tired.

 

 

 

 

MC1.2 – Are there notable differences in the patterns of activity in the park across the three days?  Please describe the notable differences you see.

 

Limit your response to no more than 3 images and 300 words.

 

The 3 days of data were concatenated into 1 dataset so that we could check for different patterns across the 3 days. The horizontal axis shows the timestamp, grouped by day of week and hour of day. The vertical axis shows the distinct count of visitors. We used a filter to exclude movement-type records. The X and Y coordinates were concatenated so that we could match each check-in record to a specific Ride Type and Location. We also used the Quick Filter in Tableau to check the patterns for each Ride Type.
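The same view can be approximated outside Tableau. The sketch below is illustrative only: the row layout `(timestamp, visitor_id, record_type, x, y)`, the timestamp format, the coordinates and the `LOCATIONS` lookup are all assumptions, not values taken from the challenge data. It excludes movement records, joins X and Y into a single key, and counts distinct visitors per day of week, hour and location.

```python
from collections import defaultdict
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical rows: (timestamp, visitor_id, record_type, x, y).
records = [
    ("2014-06-06 10:05:00", 1, "check-in", 76, 22),
    ("2014-06-06 10:40:00", 1, "check-in", 76, 22),  # same visitor, same hour
    ("2014-06-06 10:15:00", 2, "check-in", 76, 22),
    ("2014-06-06 10:20:00", 2, "movement", 75, 22),  # excluded by the filter
    ("2014-06-07 15:02:00", 3, "check-in", 32, 33),
]

# Hypothetical lookup from the concatenated "X,Y" key to a location name.
LOCATIONS = {"76,22": "Grinosaurus Stage", "32,33": "Creighton Pavilion"}


def distinct_visitors_by_hour(rows, locations):
    """Count distinct visitors per (day of week, hour, location)."""
    seen = defaultdict(set)
    for ts, visitor, record_type, x, y in rows:
        if record_type != "check-in":
            continue  # keep check-in records only
        when = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        location = locations.get(f"{x},{y}", "Unknown")
        seen[(DAYS[when.weekday()], when.hour, location)].add(visitor)
    return {key: len(visitors) for key, visitors in seen.items()}


counts = distinct_visitors_by_hour(records, LOCATIONS)
for key in sorted(counts):
    print(key, counts[key])
```

Using a set per bucket is what makes this a distinct count: visitor 1 checks in twice in the same hour but is counted once.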

 

On Fri and Sat, there were 2 shows at Grinosaurus Stage, at 10am and 3pm. However, on Sun (8 Jun), there was only 1 show, at 10am. From this, we deduce that the crime happened on Sun between 9am and 10am. This is investigated in detail in the answer to Question 3.

Another observation is that there were no visitors at Creighton Pavilion at 10am and 3pm, as the Pavilion was closed when the show started at Grinosaurus Stage.

 

 

All the Thrill Rides follow a similar pattern across the 3 days, except for the TerrorSaur ride. TerrorSaur has a similar pattern to the rest of the Thrill Rides on Friday, but on Sat and Sun its visitor count was much lower than that of the other rides. Compared to the Fri data, we observed a plateau in TerrorSaur’s visitor count on Sat and Sun between 10am and the evening (approx. 5pm). As there is a high number of repeat visitors (approximately 4 in 10), we deduce that TerrorSaur followed a similar pattern to the other rides on Friday because visitors were trying out all the rides. On Sat and Sun, however, these repeat visitors opted for the rides they found more interesting, and TerrorSaur was not among them. Hence TerrorSaur ridership was much lower on Sat and Sun compared to the other rides.

 


 

The above image shows spikes in visitors seeking information and assistance at 9am and 1pm, corresponding to the usual peak at opening time and to the 3pm showtime respectively. This is probably because visitors were asking for directions to the Grinosaurus Stage and what time the show starts. The spike is larger on Friday at 1pm, as this is the first day of the show.

 

 

 

 

MC1.3 What anomalies or unusual patterns do you see? Describe no more than 10 anomalies, and prioritize those unusual patterns that you think are most likely to be relevant to the crime.

 

Limit your response to no more than 10 images and 500 words.

 

The data are plotted by visitor ID, showing each visitor’s activities in the park over a given time period. ‘Movement’ records are shown as small blue circles, while ‘check-in’ records use bigger shapes so that they are easy to distinguish. Only the Entrance, Creighton Pavilion, and Grinosaurus Stage have different shapes, as they are the points of interest in the investigation: the 3 Entrances are triangles, the Creighton Pavilion is a cross, and Grinosaurus Stage is an asterisk. A blank space between activities means the visitor is inside a ride or is stationary until the next movement record.

 

By analysing these graphs, we found 5 anomalies that might be related to the crime and/or to possible issues in the theme park.

  1. Some visitors entered the Creighton Pavilion before 10 AM but did not leave until it reopened at 11.30 AM. They appear to have been hiding inside the Pavilion while it was closed, which makes them the most likely main suspects for the crime.

  2. Assuming the Pavilion closes at 10 AM, a group of people stayed past closing time and left at 10.03 AM. This could indicate lax security at the Pavilion, as it does not close exactly on time. These visitors might have stayed behind, waiting until there were fewer visitors in the Pavilion to commit the crime.

  3. Some visitors made multiple visits to the Pavilion, with durations of almost or over 1 hour, while the average time spent by other visitors is around 30 minutes. Going in and out so many times is suspicious, as if they were scouting the situation outside.

  4. Visitor 1983765 went only to the Pavilion, then spent a very long time at Scholtz Express and left the park. His/her main objective appears to have been the Pavilion visit, after which he/she left. This is suspicious, as we cannot be certain what happened during that time.

  5. Some visitors are recorded going through the Entrance but not taking any rides. Checking their physical movements on the map shows that they actually took some of the rides, but the ‘check-in’ records were never registered. We concluded that they had malfunctioning devices that could not record ride check-ins. Four of these visitors did go to the Pavilion, so they are on the suspicious list.

The other visitors with malfunctioning devices never went to the Pavilion, so they are on the safe list. Even so, malfunctioning devices are a significant issue that the theme park needs to fix.
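Two of the checks above (staying inside the Pavilion through its closure, and having movement records but no check-ins) lend themselves to a simple rule-based screen. The following is a minimal sketch, not the procedure we followed, which was visual. The per-visitor `tracks` structure, the visitor labels, the year in the timestamps and the function names are assumptions; the 10 AM closing and 11.30 AM reopening times are those stated above, applied here to the Sunday.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
PAVILION = "Creighton Pavilion"
CLOSE = datetime(2014, 6, 8, 10, 0)    # assumed closing time on Sunday
REOPEN = datetime(2014, 6, 8, 11, 30)  # reopening time on Sunday

# Hypothetical per-visitor records: (timestamp, record_type, location).
tracks = {
    "A": [("2014-06-08 09:50:00", "check-in", PAVILION),
          ("2014-06-08 11:32:00", "movement", None)],
    "B": [("2014-06-08 09:40:00", "check-in", PAVILION),
          ("2014-06-08 09:58:00", "movement", None)],
    "C": [("2014-06-08 08:30:00", "movement", None),
          ("2014-06-08 08:45:00", "movement", None)],
}


def stayed_through_closure(track):
    """True if a Pavilion check-in before closing has no next record until reopening."""
    events = sorted(
        ((datetime.strptime(ts, FMT), kind, loc) for ts, kind, loc in track),
        key=lambda event: event[0],
    )
    for (when, kind, loc), nxt in zip(events, events[1:]):
        if kind == "check-in" and loc == PAVILION and when < CLOSE and nxt[0] >= REOPEN:
            return True
    return False


def never_checked_in(track):
    """True if the visitor has records but none of them is a check-in."""
    return bool(track) and all(kind != "check-in" for _, kind, _ in track)


suspects = sorted(v for v, t in tracks.items() if stayed_through_closure(t))
silent_devices = sorted(v for v, t in tracks.items() if never_checked_in(t))
print("Stayed through closure:", suspects)
print("No check-ins recorded:", silent_devices)
```

A screen like this only narrows the list of visitors to inspect; each flagged visitor would still need to be examined on the activity plot and the park map before being treated as suspicious.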